in this paper. Given, for example, student marks on several study subjects, we may 
for a number of reasons be interested in measuring the lack of co-monotonicity (LOC) 
between the marks, which rarely follow monotone, let alone linear, patterns. For this 
purpose, in this paper we explore a novel approach based on a LOC index, which is related 
to, yet substantially different from, Eckhard Liebscher’s recently suggested coefficient of 
monotonically increasing dependence. To illustrate the new technique, we analyze a data-set 
of student marks on mathematics, reading and spelling. 

Keywords: association, co-monotonicity, Liebscher coefficient, LOC index, education, 
performance evaluation. 

MSC: 62H20, 62P15. 


Danang Teguh Qoyyimi: Department of Mathematics, Universitas Gadjah Mada, 
Yogyakarta 55281, Indonesia; and Department of Statistical and Actuarial Sciences, 
University of Western Ontario, London, Ontario N6A 5B7, Canada, E-mail: dqoyyimi@uwo.ca 

* Corresponding author: Ricardas Zitikis: Department of Statistical and Actuarial 
Sciences, University of Western Ontario, London, Ontario N6A 5B7, Canada, E-mail: 
zitikis@stats.uwo.ca 





1 Introduction 


The mathematical simplicity and thus interpretability of the Pearson correlation 
coefficient have encouraged researchers to use it in a variety of areas where measuring 
association between variables is of interest. In many practical situations, however, we encounter 
problems that are poorly described by linear relationships and thus measuring association 
(or lack of it) using the Pearson correlation coefficient may not be prudent. A number of 
alternative ways have emerged in the literature, including the coefficients of Blomqvist, 
Gini, Kendall, and Spearman (cf., e.g., Nelsen, 2006). 

Concisely, these coefficients provide different counting and aggregation rules of 
concordant and discordant pairs of bivariate data: two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are concordant 
if either $x_i < x_j$ and $y_i < y_j$, or $x_i > x_j$ and $y_i > y_j$. For detailed and illuminating 
discussions of these coefficients, we refer to Section 5.1 of Nelsen (2006), where they 
are also connected with the notion of copulas. For recent methodological and applied 
developments on copulas, we refer to Jaworski et al. (2010, 2013), and references therein. 
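
The counting rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four data points are hypothetical, ties are simply ignored, and the aggregation shown is the tau-a form of Kendall's coefficient, which is one of several possible aggregation rules.

```python
from itertools import combinations

def count_concordant_discordant(pairs):
    """Count concordant and discordant pairs among bivariate observations."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(pairs, 2):
        sign = (xi - xj) * (yi - yj)
        if sign > 0:        # both coordinates move in the same direction
            concordant += 1
        elif sign < 0:      # coordinates move in opposite directions
            discordant += 1
    return concordant, discordant

def kendall_tau_a(pairs):
    """Aggregate the counts as (concordant - discordant) / number of pairs."""
    c, d = count_concordant_discordant(pairs)
    n = len(pairs)
    return (c - d) / (n * (n - 1) / 2)

# Hypothetical normalized marks, not taken from the data-set analyzed here.
data = [(0.1, 0.2), (0.4, 0.3), (0.5, 0.9), (0.8, 0.6)]
print(count_concordant_discordant(data), kendall_tau_a(data))
```
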

The concordance notion leads immediately to the notion of comonotonicity that has 
deep roots in mathematics (cf. Denneberg, 1994; and references therein): Two functions 
$h$ and $g$ are comonotonic if and only if there are no $t_i$ and $t_j$ such that $h(t_i) < h(t_j)$ and 
$g(t_i) > g(t_j)$. This notion has turned out to be particularly useful in economics, finance, 
and insurance. For details and references on the topic, we refer to, e.g., Dhaene et al. 
(2006), and references therein. 

A number of indices for measuring dependence, concordance, and comonotonicity have 
been proposed in the literature (cf., e.g., Koch and De Schepper, 2011; Dhaene et al., 2012, 
2014; Liebscher, 2014; and references therein). All of them are concerned with different 
aspects of dependence but nevertheless - as intended by the authors - fall into a large class 
of concordance coefficients that possess certain ‘desirable’ characteristics or properties (cf., 
e.g., Schweizer and Wolff, 1981; Scarsini, 1984; Nelsen, 2006; and references therein). In 
particular, among those characteristics is a symmetry (or interchangeability, permutation, 
etc.) condition, which in the context of the present paper is not desirable and would even 
be misleading, due to the very reason that explanatory and response variables are not 
symmetric (interchangeable). Hence, for measuring the lack of, or departure from, 
co-monotonicity between pairs of variables, none of the aforementioned coefficients can truly 
serve our purpose. 

Nevertheless, Liebscher’s (2014) suggestion for determining whether co-movements of 
random variables follow an increasing pattern is philosophically closest to our current 
research, and we shall discuss the index briefly now, with an extensive discussion given 
only at the end of this paper, in Section 6, when all the required notions and notations 
have been introduced. Specifically, given a pair of random variables, say $X$ and $Y$, whose 
cdf’s we denote by $F$ and $G$, respectively, Liebscher’s (2014) coefficient of monotonically 
increasing dependence is 

$$\zeta_{X,Y} = 1 - \frac{1}{c_\psi}\, E\big[\psi\big(F(X) - G(Y)\big)\big], \tag{1}$$

where $c_\psi = 2\int_0^1 (1-x)\,\psi(x)\,dx$ is the normalizing constant, and $\psi$ can be any non-negative 
function on the interval $[-1,1]$ that is symmetric around 0 and such that $\psi(0) = 0$. Various 
properties and extensions of this index have been discussed by Liebscher (2014), from 
which we see that, to a certain degree, the index can be used for tackling the problem of 
the current paper. Yet, due to a different goal set out by Liebscher (2014), his index does 
not truly serve our needs because it is 1) symmetric with respect to $X$ and $Y$, as we have 
noted earlier, and 2) based on rank scatterplots, whereas our problem relies on raw-data 
scatterplots, which can be considerably different from rank-based scatterplots, as we shall 
see from graphs in Section 6. 

We have organized the rest of the paper as follows. In Section 2 we describe a classical 
data-set of Thorndike and Thorndike-Christ (2010), which is of our primary interest, 
and then visualize the data using scatterplots with superimposed classical least-squares 
regression lines. In Section 3 we fit curves to bivariate data using several powerful methods 
available in the literature, which is a precursor to our use of an index for measuring lack 
of co-monotonicity (LOC). The definition and properties of the LOC index are discussed 
in Section 4, where we also provide a convenient computational formula for the index. In 
Section 5 we utilize the LOC index to analyze the data-set of Thorndike and 
Thorndike-Christ (2010). In Section 6 we discuss the difference between the LOC index and that of 
Liebscher (2014). Section 7 concludes the paper with a discussion and further references 
highlighting the importance of the topic that we research in the present paper. Some 
technicalities have been relegated to Appendix A. 

2 Data 

To facilitate full transparency of our reasoning and adopted methodology, we use publicly 
available data of Thorndike and Thorndike-Christ (2010, pp. 24-25). The data consist of 
marks of 52 sixth grade students on three study subjects: Mathematics, Reading, and 
Spelling. The students belonged to two classes, taught by two teachers, who administered 
tests on the three subjects. For each student and for each study subject, the teachers 
reported the number of correct answers and used them to assess each student’s achievement 
on each of the three subjects. 

For our analysis, we first normalize the marks to the unit interval $[0,1]$ by dividing the 
number of correct answers by the total number of items (i.e., questions or problems) on the 
tests: 65 items for Mathematics, 45 for Reading, and 80 for Spelling. Hence, throughout 
the paper we deal with functions $h : [0,1] \to [0,1]$ that model association between pairs 
of study subjects, which we denote by $X$ and $Y$, connected via the hypothetical equation 
$y = h(x)$ with $h$ estimated from data (topic of Section 3). Summary statistics and 
histograms of the normalized marks are reported in Table 1 and Figure 1. In Figure 


Summary statistics       Mathematics   Reading   Spelling
Minimum                  0.2923        0.4667    0.4750
1st quartile             0.5077        0.6833    0.6375
2nd quartile (median)    0.5846        0.7778    0.7188
3rd quartile             0.6769        0.8667    0.8000
Mean                     0.5873        0.7654    0.7192
Maximum                  0.9231        0.9778    0.9500
Standard deviation       0.1373        0.1233    0.1129

Table 1: Summary statistics. 





Figure 1: Frequency histograms: (a) Mathematics, (b) Reading, (c) Spelling. 

2 we have depicted the corresponding six scatterplots, which provide valuable insights 
into relationships between paired variables. Even though we argue that the relationships 
between the student marks on all pairs of the three study subjects are non-linear, it is 
nevertheless instructive to start our considerations with classical least-squares regression 
lines, which we have depicted in Figure 2, and values of the Pearson correlation coefficient, 
which we have reported in Table 2. 

















































Figure 2: Scatterplots and least-squares regression lines: (a) Mathematics-Reading, 
(b) Mathematics-Spelling, (c) Reading-Mathematics, (d) Reading-Spelling, 
(e) Spelling-Mathematics, (f) Spelling-Reading. 
              Mathematics   Reading    Spelling
Mathematics   1.000000      0.622224   0.146615
Reading       0.622224      1.000000   0.642215
Spelling      0.146615      0.642215   1.000000

Table 2: Pearson correlation coefficients. 


3 Curve fitting 

Here we discuss curve fitting to scatterplots - and we have six of them (see Figure 2) - 
which is a precursor to calculating the LOC index, which is a topic of Section 4 below. 

A number of approaches have been developed for fitting curves to bivariate data. 
The parametric approach is one of them, which includes popular models such as linear, 
generalized linear, nonlinear, parametric growth curve, and many other ones (cf., e.g., 
Seber and Wild, 1989; and Panik, 2014). The disadvantage of this approach, especially 
in the context of the present paper, is that the shape and form of the functions to be 
fitted are difficult to guess, and choosing them thus involves an element of subjectivity that we want to 
avoid. Hence, we opt for the non-parametric approach, which is sometimes referred to as 
scatterplot smoothing (cf., e.g., Ruppert et al., 1995). 

In general, there are two broad non-parametric approaches for fitting curves to 
bivariate data: one is based on the conditional mean and the other on the conditional quantile. 
Both methods have their own advantages and disadvantages, and we shall illustrate both 
of them. We note at the outset that in the case of the conditional quantile, we shall 
restrict our attention to the conditional median, which serves as a natural alternative to the 
mean when data are skewed. Some further details and references on the two methods will 
be provided in Section 3.1 below, with their actual use for analyzing the data of Thorndike 
and Thorndike-Christ (2010) exhibited in Section 5. 

3.1 Constructing h 

The conditional-mean approach is based on the assumption that a good model for $h$ is 
given by the conditional mean, and thus 

$$h(x) = E[\,Y \mid X = x\,]. \tag{2}$$

Given a scatterplot consisting of $n$ pairs $(x_i, y_i)$, the local linear estimate - which is our 
choice among many other ones available in the literature - for estimating $h(x)$ is given by 

$$\hat{h}(x) = \hat{\beta}_0,$$

where $\hat{\beta}_0$ is a solution to the minimization problem 

$$\min_{\beta_0,\beta_1} \sum_{i=1}^{n} L\big(y_i - (\beta_0 + \beta_1(x_i - x))\big)\, K\big((x_i - x)/b\big); \tag{3}$$

throughout this paper we work with the standard normal kernel $K$. Details and references 
on the selection of the bandwidth $b$ will be provided in Section 3.2 below. As to the loss function 
$L$, in the conditional-mean case we use the quadratic loss function $L(x) = x^2$, which is a 
natural choice because the expected quadratic loss is minimized at the mean. In the case 
of the conditional-median approach, an analogous argument leads us toward the absolute 
loss function $L(x) = |x|$. 

We note in passing that this estimate naturally arises from the fact - recall here 
the local constant regression method of Nadaraya and Watson - that $h(x)$ defined by 
equation (2) minimizes $E[(Y - \beta_0)^2 \mid X = x]$ with respect to $\beta_0$. 
The additional quantity $\beta_1(x_i - x)$ in objective function (3) is included to diminish the 
asymptotic bias of the estimate, compared to the bias arising from the Nadaraya-Watson 
method (cf., e.g., Fan, 1992). For further properties of the local linear estimate, we refer 
to Wand and Jones (1995), and references therein. 
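
As an illustration of the local linear estimate under the quadratic loss, the following Python sketch solves minimization problem (3) in closed form as a kernel-weighted least-squares fit. It is a minimal sketch, not the implementation used in this paper: the function names are hypothetical, the bandwidth `b` is supplied by hand rather than selected from data, and the toy data are exactly linear so that the estimate can be checked against the known line.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def local_linear(x, xs, ys, b):
    """Local linear estimate of h(x): the intercept of the kernel-weighted
    least-squares line in (x_i - x), i.e. problem (3) with L(u) = u**2."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        w = gaussian_kernel((xi - x) / b)
        d = xi - x
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    # Solve the 2x2 normal equations for beta_0 (the value of the fit at x).
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)

# Hypothetical, exactly linear data y = 2x + 1: the local linear fit reproduces it.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * v + 1.0 for v in xs]
print(local_linear(0.3, xs, ys, b=0.2))
```

Because a local linear fit reproduces any exactly linear relationship, the printed value should agree with 2(0.3) + 1 = 1.6 up to rounding, whatever bandwidth is supplied.
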

It is also natural to use the conditional-quantile approach (Koenker, 2005), which is 
based on the assumption that a good model for $h(x)$ is given by the conditional quantile, 
and thus 

$$h(x) = Q_{Y \mid X = x}(\tau)$$

for some $\tau \in (0,1)$. An estimate $\hat{h}(x)$ of $h(x)$ stems from minimization problem (3) 
using the loss function $L(x)$ that is equal to $\tau x$ for all $x \ge 0$ and $(1-\tau)(-x)$ for all $x < 0$. 
Upon recalling that throughout this paper we set $\tau = 0.5$, in the conditional-median case 
we therefore work with the absolute loss function $L(x) = 0.5|x|$; the factor 0.5 is of course 
irrelevant in our considerations as it does not influence the result of minimization problem 
(3). 
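
The role of this loss function can be seen in a small Python sketch, given here only as an illustration: the function names and the five data values are hypothetical, and the minimization is done over a constant (no kernel weights, no slope) by brute force over the observed values. With the level set to 0.5, the minimizer is the sample median, and dropping the factor 0.5 would not change it.

```python
def check_loss(u, tau):
    """Quantile (check) loss: tau*u for u >= 0 and (1 - tau)*(-u) for u < 0."""
    return tau * u if u >= 0 else (1.0 - tau) * (-u)

def best_constant(ys, tau):
    """Observed value c minimizing the total check loss of the residuals y - c."""
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))

# Hypothetical skewed sample: the mean is 13.2, the median is 3.
sample = [1, 2, 3, 10, 50]
print(best_constant(sample, tau=0.5))
```
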


3.2 Bandwidth selection 


The construction of the bandwidth $b$ is based on how good the resulting estimator $\hat{h}(x)$ of 
$h(x)$ is, and for this task it is customary to use the mean integrated squared error (MISE) 

$$\mathrm{MISE}(\hat{h}) = \int E\big[\big(\hat{h}(x) - h(x)\big)^2 \,\big|\, x_1, x_2, \dots, x_n\big]\, w(x)\,dx \tag{4}$$

with some weight function $w$ that ensures convergence of the integral (e.g., Ruppert and 
Wand, 1994). Specifically, the bandwidth is chosen so that it asymptotically minimizes 
the MISE. There are of course other good ways to choose the bandwidth but we shall not 
delve deeply into this subject here and just note some of the facts that we shall utilize in 
our data-driven computations. 

Namely, we follow Ruppert and Wand (1994), Ruppert et al. (1995), and Fan and 
Gijbels (2000) when using the conditional-mean approach. We start out with the asymptotic 
optimal bandwidth given by formula (3.21) in Fan and Gijbels (1996, p. 68). To facilitate 
its practical implementation, we use the direct plug-in method proposed by Ruppert et 
al. (1995, pp. 1262-1263). In the latter reference, the resulting bandwidth is denoted by 
$\hat{h}_{\mathrm{DPI}}$, which in the present paper we denote by $b$ to avoid possible notational confusion 
with the estimate $\hat{h}$ of $h$. 

When using the conditional-median approach, we follow Yu and Jones (1997), who 
show that the optimal bandwidth in this case is equal to the estimate $b$ from the 
conditional-mean approach multiplied by 

$$\left\{ \frac{\tau(1-\tau)}{\varphi\big(\Phi^{-1}(\tau)\big)^2} \right\}^{1/5},$$

where $\tau = 1/2$ due to our median-based approach. The $\varphi$ in the above quantity is the 
standard normal density, and $\Phi^{-1}$ is the standard normal quantile function. Hence, in 
summary, the optimal bandwidth under the conditional-median approach is $b\,(\pi/2)^{1/5}$. 
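
The adjustment factor is easy to evaluate numerically. The Python sketch below, whose function name is hypothetical, uses only the standard library and confirms that at the median the factor reduces to $(\pi/2)^{1/5}$, that is, roughly 1.09; it assumes the factor has the form $\{\tau(1-\tau)/\varphi(\Phi^{-1}(\tau))^2\}^{1/5}$.

```python
import math
from statistics import NormalDist

def quantile_bandwidth_factor(tau=0.5):
    """Multiplier turning the conditional-mean bandwidth b into the
    conditional-quantile bandwidth at level tau."""
    std_normal = NormalDist()            # mean 0, standard deviation 1
    q = std_normal.inv_cdf(tau)          # standard normal quantile
    return (tau * (1.0 - tau) / std_normal.pdf(q) ** 2) ** 0.2

print(quantile_bandwidth_factor(0.5), (math.pi / 2.0) ** 0.2)
```

The factor is symmetric in the level, so levels 0.25 and 0.75 give the same multiplier.
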


4 Measuring the lack of co-monotonicity 

In view of the above discussion, we can now assume that for any given scatterplot we have 
constructed a well-fitting function $h : [0,1] \to [0,1]$: if it happens to be increasing, then 
we say that the random variables $X$ and $Y$ have co-monotonic movements, but if not, 
then we want to assess how much the function deviates from the increasing pattern. This 
we accomplish using an index that takes value 0 when $h$ is increasing and some positive 
value otherwise: the more the function deviates from the increasing pattern, the larger 
the value. The index, which we call the lack of co-monotonicity (LOC) index, is discussed 
next. 

4.1 The LOC index 

We start with the well-known notion of increasing rearrangement (cf., e.g., Hardy et al., 
1952), which for our function $h : [0,1] \to [0,1]$ is defined by 

$$I_h(t) = \inf\{x \in \mathbb{R} : G_h(x) \ge t\}$$

for all $t \in [0,1]$, where $G_h(x) = \lambda\{s \in [0,1] : h(s) \le x\}$ and $\lambda$ is the Lebesgue measure. 



Note 4.1 If we interpret the function $h$ as a random variable on the probability space 
$([0,1], \mathcal{B}, \lambda)$, then statisticians would immediately recognize that $G_h(x)$ is the 
cumulative distribution function of $h$, and $I_h(t)$ is the quantile function of $h$. We find these 
interpretations useful to work out good intuition on the subject. 

Hence, to construct an index that would measure the lack of, or departure from, 
co-monotonicity between pairs of variables, we need to choose an appropriate functional 
that would couple $I_h$ and $h$ in such a way that the resulting quantity would be zero if 
and only if the functions $I_h$ and $h$ coincide, that is, the fitted function $h$ is increasing 
(to be more precise, non-decreasing). Among such candidates are the maximal distance 
between $I_h$ and $h$, called the sup-norm, as well as the integrated distance between the two 
functions, called the $L_1$-norm. Though mathematically appealing, the two choices are not 
good candidates for the purpose due to the lack of so-called co-monotonic addition (to be 
explained in a moment), as has been pointed out by Qoyyimi and Zitikis (2014) in the $L_1$ 
case. 

Qoyyimi and Zitikis (2014) argue that a suitable candidate for the LOC index is 

$$\mathcal{L}(h) = \int_0^1 t\,\big(I_h(t) - h(t)\big)\,dt. \tag{5}$$

The integral is always non-negative, equal to 0 for every increasing function, and takes on 
(strictly) positive values for all other functions. Furthermore, $\mathcal{L}(h + d) = \mathcal{L}(h)$ for every 
real constant $d$, and $\mathcal{L}(ch) = c\,\mathcal{L}(h)$ for every non-negative constant $c$. If $g$ is a function 
co-monotonic with $h$, which means that both $g$ and $h$ increase and decrease on the same 
intervals of their joint domain of definition, then $\mathcal{L}(g + h) = \mathcal{L}(g) + \mathcal{L}(h)$. We view this 
co-monotonic addition property as important for every LOC index to satisfy, and this is the reason 
we have abandoned the use of the aforementioned sup- and $L_1$-norms. 

Given the prominent role that the notion of monotone rearrangement is playing in the 
definition of the LOC index $\mathcal{L}$, it is instructive to mention that the notion has been very 
successfully used in quite a number of applications: 

• Efficient insurance contracts (e.g., Carlier and Dana, 2005; Dana and Scarsini, 2007). 

• Rank-dependent utility theory (Quiggin, 1982, 1993; also Carlier and Dana, 2003, 
2008, 2011). 

• Continuous-time portfolio selection (e.g., He and Zhou, 2011; Jin and Zhou, 2008). 

• Statistical applications such as performance improvement of estimators (e.g., 
Chernozhukov et al., 2009, 2010) and optimization problems (e.g., Rüschendorf, 1983). 


These are of course just a few illustrative topics and references, but they lead us into the 
vast literature on monotone rearrangements and their manifold uses. 


4.2 Computational formula 

Given its properties, the LOC index $\mathcal{L}$ is attractive but highly unwieldy even when $h$ 
has a simple mathematical expression, let alone when it arises nonparametrically from a 
scatterplot. Hence, we need a simple computational method for the index even when $h$ 
does not have an explicit mathematical expression. 

To this end, we first partition the interval $[0,1]$ into $m$ subintervals using the points 
$i/m$, $i = 1,2,\dots,m$. Then for each $i = 1,2,\dots,m$ we choose any point $t_i \in ((i-1)/m,\, i/m]$ 
for which the value $\tau_i := h(t_i)$ is available. Hence, we have the step-wise function 
$D_m : [0,1] \to [0,1]$ defined by 

$$D_m(t) = \begin{cases} \tau_1 & \text{when } t = 0, \\ \tau_i & \text{when } t \in ((i-1)/m,\, i/m], \end{cases} \tag{6}$$

whose LOC index has a very simple computational formula (proof in Appendix A): 

$$\mathcal{L}(D_m) = \frac{1}{m^2} \sum_{i=1}^{m} i\,\big(\tau_{i:m} - \tau_i\big), \tag{7}$$

where $\tau_{1:m} \le \cdots \le \tau_{m:m}$ are the ordered values of $\tau_1, \dots, \tau_m$. Furthermore (proof in 
Appendix A), when $m \to \infty$, then 

$$\mathcal{L}(D_m) \to \mathcal{L}(h). \tag{8}$$

Hence, to calculate $\mathcal{L}(h)$ numerically, we need to calculate $\mathcal{L}(D_m)$, which approximates 
$\mathcal{L}(h)$ as precisely as desired provided that $m$ is sufficiently large. 
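
The computation can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the evaluation points are taken to be the right endpoints $t_i = i/m$, and the LOC index of the step-wise function is assumed to be $m^{-2}\sum_{i=1}^{m} i\,(\tau_{i:m} - \tau_i)$, which is what integrating $t\,(I_{D_m}(t) - D_m(t))$ over the $m$ subintervals gives.

```python
def loc_index(h, m=1000):
    """Approximate the LOC index of h on [0, 1] via the step function D_m.

    Assumes the evaluation points t_i = i/m and the formula
    (1/m**2) * sum_i i * (tau_{i:m} - tau_i).
    """
    taus = [h(i / m) for i in range(1, m + 1)]   # tau_i = h(t_i)
    ordered = sorted(taus)                        # tau_{1:m} <= ... <= tau_{m:m}
    total = sum(i * (o - t) for i, (o, t) in enumerate(zip(ordered, taus), start=1))
    return total / m ** 2

# An increasing function has no lack of co-monotonicity.
print(loc_index(lambda t: t * t))
# For the decreasing function h(t) = 1 - t, the increasing rearrangement is t,
# and the integral of t * (t - (1 - t)) over [0, 1] equals 1/6.
print(loc_index(lambda t: 1.0 - t))
```

For $h(t) = 1 - t$ the approximation works out to $(1 - 1/m^2)/6$, so with $m = 1{,}000$ it agrees with the limit $1/6$ to about six decimal places.
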


5 Data analysis and findings 

We work with six scatterplots, and to each of them we fit two curves: one using the 
conditional-mean approach and the other one using the conditional-median approach. In 
both cases, we use the same mathematical notation $\hat{h}$ but when plotting in Figure 3 we 
use different colors to distinguish the two cases. The technicalities of curve fitting follow 
next, for which we use the R software (R Core Team, 2013). 

In the case of the conditional-mean approach, we use the local linear kernel regression 
method as discussed in Section 3.1. To aid us with computations, we use the R package 
KernSmooth (Wand and Ripley, 2014) with the function dpill assigned for selecting the 
optimal bandwidth and the function locpoly (with degree=1) for curve fitting. We set 
the grid size to 1,000. 

Figure 3: Conditional-mean (blue) and conditional-median (red) based curves: 
(a) Mathematics-Reading, (b) Mathematics-Spelling, (c) Reading-Mathematics, 
(d) Reading-Spelling, (e) Spelling-Mathematics, (f) Spelling-Reading. 

In the case of the conditional-median approach, we use the R package quantreg 
(Koenker, 2015) with the function lprq used to obtain $\hat{h}$ with $\tau = 1/2$ and $m = 1{,}000$. 
We see from the six panels of Figure 3 that all the resulting estimates $\hat{h}$ are more jiggly than those 
arising from the conditional-mean approach. We could certainly improve them with more 
work and a more sophisticated tuning of the parameters, but this would defeat our purpose 
of showing that we can easily calculate the LOC index irrespective of how irregular 
the function is. 

Based on our visual assessment, no function in Figure 3 appears to be increasing over 
its entire domain of definition. Nevertheless, we may argue that some of them are more 
increasing than others. To substantiate this claim, we employ the LOC index discussed 
in Section 4. The following terminology is useful. 

Definition 1 Given two functions $g, h : [0,1] \to [0,1]$, we say that 

(1) $g$ deviates from increasing pattern by the amount $\mathcal{L}(g)$; 

(2) $g$ deviates less from increasing pattern than $h$ when $\mathcal{L}(g) < \mathcal{L}(h)$; and 

(3) pairs $(v_i, w_i)$, $i = 1, \dots, n$, exhibit less LOC than pairs $(x_i, y_i)$, $i = 1, \dots, m$, when 
$\mathcal{L}(g) < \mathcal{L}(h)$, where $g$ arises from the pairs $(v_i, w_i)$ and $h$ from $(x_i, y_i)$. 

Following the guidelines of Section 4.2, we produce the step-wise approximation $D_m$ 
of the function $\hat{h}$. Then we calculate the index $\mathcal{L}(D_m)$ according to formula (7). Findings 
in the form of ‘LOC matrices’ are presented in Tables 3 and 4, whose entries are the 

X \ Y         Mathematics   Reading    Spelling
Mathematics   0.000000      0.231814   1.759735
Reading       0.007202      0.000000   0.097565
Spelling      0.855971      0.145532   0.000000

Table 3: Conditional-mean based LOC matrix (entries multiplied by 1,000). 

values of the LOC index: the larger the value, the more the corresponding pairs deviate 
from the co-monotonic pattern. 

The LOC matrix is, naturally, asymmetric, and it should be such in order to match 
the asymmetry that we see in the respective paired panels of Figure 2. For example, 







X \ Y         Mathematics   Reading    Spelling
Mathematics   0.000000      0.286703   0.923108
Reading       0.007911      0.000000   0.163541
Spelling      2.197968      0.175055   0.000000

Table 4: Conditional-median based LOC matrix (entries multiplied by 1,000). 

the entry 0.231814 in Table 3 is the value (multiplied by 1,000) of the LOC index for 
Mathematics-Reading, whereas 0.007202 is the value (multiplied by 1,000) of the LOC 
index for Reading-Mathematics. We have multiplied all the original LOC-index values by 
1,000 to avoid recording too many decimal zeros in the tables. 

Naturally, one may also wish to know how much a given study subject influences the 
other ones, which leads us in the direction of causality (cf., e.g., Pearl, 2009; and references 
therein), which at this stage of our research we want to avoid discussing. Nevertheless, 
the reader may wish to draw some conclusions from Tables 3 and 4 as well as from the 
scatterplots of Figure 2. Note that even though the corresponding entries of Tables 3 
and 4 are different, the causality-type conclusions that we may infer from both of them 
would not contradict each other. This may not always be the case, especially if data 
are considerably skewed. In the case of the data that we are exploring, however, the 
descriptive statistics and histograms in Section 2 suggest fairly symmetric distributions 
of all the three study subjects. 


6 Comparing the LOC index with Liebscher’s $\zeta$ 


Here we compare the LOC index $\mathcal{L}$ with Liebscher’s (2014) coefficient $\zeta_{X,Y}$ of 
monotonically increasing dependence. Naturally, to understand $\zeta_{X,Y}$ we only need to understand 
its expectation-based part, which under the quadratic function $\psi(x) = x^2/2$ is equal to 

$$I_{X,Y} = \frac{1}{2}\, E\big[\big(F(X) - G(Y)\big)^2\big].$$

Note 6.1 The quantity $I_{X,Y}$ is closely related to the Spearman rank correlation 
coefficient, denoted here by $S_{X,Y}$, which is, by definition, equal to the Pearson correlation 
coefficient between $F(X)$ and $G(Y)$. Hence, we easily check the equation $S_{X,Y} = 1 - 12\, I_{X,Y}$. 


Next we work with a scatterplot $(x_i, y_i)$, $i = 1, \dots, n$, which we view as our 
‘population.’ To avoid computational complications that inevitably arise when dealing with 
ranks when some of the $x_i$’s or $y_i$’s are equal, throughout the rest of this section we work 
under the assumption 

$$x_i \ne x_j \ \text{ and } \ y_i \ne y_j \ \text{ whenever } \ i \ne j. \tag{9}$$

Note 6.2 Assumption (9) is violated by the data-set of Thorndike and Thorndike-Christ 
(2010). However, this is not an issue because we can always add negligible noise (e.g., 
independent and identically distributed normal random variables with means 0 and very 
small standard deviations, say $10^{-5}$) and make all the marks unequal without practically 
changing their numerical values. 


Let $F_n$ and $G_n$ be the marginal cdf’s defined by $F_n(x) = \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}/n$ and 
$G_n(x) = \sum_{i=1}^{n} \mathbf{1}\{y_i \le x\}/n$. Under this ‘finite population’ scenario, the quantity $I_{X,Y}$ 
becomes 

$$I_{n,X,Y} = \frac{1}{2n} \sum_{i=1}^{n} \big(F_n(x_i) - G_n(y_i)\big)^2.$$

Let $x_{1:n} < \cdots < x_{n:n}$ be the ordered values of $x_1, \dots, x_n$, and let $y_{(1)}, \dots, y_{(n)}$ be the 
corresponding induced ordered values. In other words, the original pairs $(x_i, y_i)$ have 
been ordered according to their first coordinates and the resulting pairs are now $(x_{i:n}, y_{(i)})$. 
With the notation 

$$r_i = n\, G_n(y_{(i)}) \tag{10}$$

we have 

$$I_{n,X,Y} = \frac{1}{2n} \sum_{i=1}^{n} \big(F_n(x_{i:n}) - G_n(y_{(i)})\big)^2 = \frac{1}{2n} \sum_{i=1}^{n} \Big(\frac{i}{n} - \frac{r_i}{n}\Big)^2 = \frac{1}{2n^3} \sum_{i=1}^{n} (i - r_i)^2, \tag{11}$$

where we used the equation $F_n(x_{i:n}) = i/n$. 

Next we construct a function $h_n^0 : [0,1] \to [0,1]$ such that $\mathcal{L}(h_n^0)$ is equal to the 
right-hand side of equation (11) or, in other words, such that 

$$\mathcal{L}(h_n^0) = I_{n,X,Y}. \tag{12}$$

Namely, for every $i = 1, \dots, n$, let 

$$h_n^0(t) = \frac{r_i}{n} \quad \text{for all } t \in \Big(\frac{i-1}{n}, \frac{i}{n}\Big]. \tag{13}$$

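The LOC index of the function $h_n^0$ is 

$$\mathcal{L}(h_n^0) = \sum_{i=1}^{n} \Big(\frac{i}{n} - \frac{r_i}{n}\Big) \int_{(i-1)/n}^{i/n} t\,dt = \frac{1}{2n^3} \sum_{i=1}^{n} (i - r_i)(2i - 1) = \frac{1}{2n^3} \sum_{i=1}^{n} (i - r_i)^2, \tag{14}$$

where we used the equations $\sum_{i=1}^{n} i = \sum_{i=1}^{n} r_i$ and $\sum_{i=1}^{n} i^2 = \sum_{i=1}^{n} r_i^2$. This establishes 
equation (12) and helps us to connect the LOC index $\mathcal{L}$ with Liebscher’s $\zeta$. 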

For this, we first observe that the set of equations $h_n^0(i/n) = r_i/n$, $i = 1, \dots, n$, is 
equivalent to the set $h_n^0(F_n(x_{i:n})) = G_n(y_{(i)})$, $i = 1, \dots, n$, which is in turn equivalent to 
the set of equations $h_n^0(F_n(x_i)) = G_n(y_i)$, $i = 1, \dots, n$. This implies that Liebscher’s $\zeta$ 
is the LOC index $\mathcal{L}$ of the step-wise function $h_n^0$, which originates from the rank-based 
scatterplot $(F_n(x_i), G_n(y_i))$ and not from the original scatterplot $(x_{i:n}, y_{(i)})$. This also 
explains a considerable difference between the meanings of the two indices. To support 
our conclusions, we have depicted the two scenarios in Figure 4, where we have used 
Mathematics (with added small noise; recall Note 6.2) as the ‘explanatory’ variable and 
Reading (with added small noise) as the ‘response.’ 
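
The identity between the rank-based quantity and the LOC index of the rank step function can be checked numerically. The Python sketch below is an illustration only: the function names and the four data pairs are hypothetical, no ties are assumed, and the LOC index of the step function with values $r_i/n$ is computed as $n^{-2}\sum_{i=1}^{n} i\,(i/n - r_i/n)$, which is what integrating $t$ times the difference between the rearranged and original step functions gives.

```python
def ranks(values):
    """Ranks 1..n of the values (no ties assumed), i.e. n times the empirical cdf."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result

def rank_based_quantity(xs, ys):
    """(1/(2n)) * sum of (F_n(x_i) - G_n(y_i))**2 over the scatterplot."""
    n = len(xs)
    return sum((a / n - b / n) ** 2 for a, b in zip(ranks(xs), ranks(ys))) / (2 * n)

def loc_of_rank_step_function(xs, ys):
    """LOC index of the step function taking the value r_i/n on ((i-1)/n, i/n]."""
    n = len(xs)
    y_ranks = ranks(ys)
    by_x = sorted(range(n), key=lambda k: xs[k])   # order pairs by first coordinate
    r = [y_ranks[k] for k in by_x]                 # r_i = n * G_n(y_(i))
    # The increasing rearrangement of the step function takes the values i/n.
    return sum(i * (i / n - ri / n) for i, ri in enumerate(r, start=1)) / n ** 2

# Hypothetical marks without ties.
xs = [0.3, 0.1, 0.9, 0.5]
ys = [0.7, 0.2, 0.4, 0.8]
print(rank_based_quantity(xs, ys), loc_of_rank_step_function(xs, ys))
```
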

Consequently, in order to decide whether the problem at hand would be better served 
by the LOC index $\mathcal{L}$ or Liebscher’s $\zeta$, we first need to decide whether the solution of the 
problem should rely on the original scatterplot $(x_{i:n}, y_{(i)})$, $i = 1, \dots, n$, or on the 
rank-based scatterplot $(F_n(x_{i:n}), G_n(y_{(i)}))$; the latter is of course equivalent to the scatterplot 
$(i/n, r_i/n)$, $i = 1, \dots, n$. If the association between student rankings according to their 
marks is of primary interest, with no consideration to causality, then Liebscher’s $\zeta$ is an 
appropriate index. If, however, the marks themselves are of primary interest, as is the 
case in the current paper, and keeping in mind that the marks are not interchangeable 
random variables with respect to causality, then we should rely on the original scatterplot 
$(x_{i:n}, y_{(i)})$, $i = 1, \dots, n$, and use the LOC index $\mathcal{L}$. 

Note, however, that at present the LOC index is available only for pairs of variables, 
which is of immediate interest for educational psychologists, whereas Liebscher’s 
methodology has been extended to the multivariate case (Section 5 in Liebscher, 2014) and could 
provide further valuable insights into the problem when all study subjects are viewed as 
integral parts of one ‘study portfolio.’ 

Figure 4: Raw and rank-based scatterplots (with added negligible noise) and fitted functions. 


7 Concluding notes and further work 

The herein proposed index for measuring the lack of co-monotonicity (LOC) between pairs 
of variables is capable of measuring the extent to which the variables deviate from 
co-monotonic patterns. The LOC index is designed to work with all relationships, including 
non-linear and non-monotonic. The performance of the index has been illustrated using 
the Thorndike and Thorndike-Christ (2010) data-set consisting of student marks on three 
study subjects. 

In addition to the educational assessment problem that we have tackled in this paper, 
there are of course numerous other applications where monotonicity, or lack of it, matters, 
and we next present a few examples to illustrate the point. 

The presence of a deductible $d > 0$ often changes the profile of insurance losses (e.g., 
Brazauskas et al., 2015). Because of this and other reasons, given two losses $X$ and 
$Y$, which may not be observable, decision makers may wish to determine whether the 
observable losses $X_d = [X \mid X > d]$ and $Y_d = [Y \mid Y > d]$ are stochastically (ST) ordered, 
say $X_d \le_{ST} Y_d$ for every $d > 0$. It is well known that this ordering, which is known in 
the literature as the hazard rate ordering, is equivalent to the ratio 
$S_Y(x)/S_X(x)$ of the survival functions of $Y$ and $X$ being non-decreasing in $x$. 

More generally, one may wish to determine whether for every deductible $d > 0$ and 
every policy limit $L > d$, the observable insurance losses $X_{d,L} = [X \mid d < X < L]$ and 
$Y_{d,L} = [Y \mid d < Y < L]$ are stochastically ordered. This ordering, which is known in the 
literature as the likelihood ratio ordering, is equivalent to the ratio 
$f_Y(x)/f_X(x)$ of the density functions of $Y$ and $X$ being non-decreasing in $x$. For further details on 
various stochastic orderings and their manifold uses, we refer to Shaked and Shanthikumar 
(2006), Li and Li (2013), and references therein. 

We next briefly present a few more examples and related references where monotonicity, 
or lack of it, of certain functions plays an important role: 

• Growth curves (cf., e.g., Chernozhukov et al., 2009; Panik, 2014). 

• Mortality curves (cf., e.g., Gavrilov and Gavrilova, 1991; Bebbington et al., 2011). 

• Positive regression dependence and risk sharing (cf., e.g., Lehmann, 1966; Dana and 
Scarsini, 2007). 

• Comonotonicity, portfolio construction, and capital allocations (cf., e.g., Dhaene et 
al., 2006; Furman and Zitikis, 2008). 

• Decision theory and stochastic ordering (cf., e.g., Denuit et al., 2005; Shaked and 
Shanthikumar, 2006; Li and Li, 2013). 

• Engineering reliability and risks (cf., e.g., Lai and Xie, 2006; Li and Li, 2013). 

One unifying feature of these diverse works is that they impose monotonicity 
requirements on certain functions, which are generally unknown, and thus researchers may seek 
statistical models and data for determining their shapes. To illustrate the point, we 
recall, for example, the work of Bebbington et al. (2011), who specifically set out to 
determine whether mortality continues to increase or starts to decelerate after a certain 
species-related late-life age. This is known in the literature as the late-life mortality deceleration 
phenomenon. Hence, we can rephrase the phenomenon as a question: is the mortality 
function always increasing? Naturally, we do not elaborate on this topic any further in 
this paper, referring the interested reader to Bebbington et al. (2011), and references 
therein. 

To verify the monotonicity of functions such as those noted in the above examples, 
researchers quite often assume that the functions belong to some parametric or 
semiparametric families. One may not, however, be comfortable with this element of subjectivity 
and may thus prefer to rely solely on data to make a judgement. Under these circumstances, 
verifying monotonicity becomes a non-parametric problem, whose solution asks for an 
index that, for example, takes on the value 0 when the function under consideration is 
non-decreasing and on positive values otherwise. This is exactly the topic that we have 
dealt with in the present paper. 


A Appendix: proofs 


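Proof of equation (7). We check that 

$$D_m(t) = \sum_{i=1}^{m} \tau_i\, \mathbf{1}\Big\{\frac{i-1}{m} < t \le \frac{i}{m}\Big\}$$

and 

$$I_{D_m}(t) = \sum_{i=1}^{m} \tau_{i:m}\, \mathbf{1}\Big\{\frac{i-1}{m} < t \le \frac{i}{m}\Big\}$$

for all $t \in (0,1]$. Hence, 

$$\mathcal{L}(D_m) = \sum_{i=1}^{m} \big(\tau_{i:m} - \tau_i\big) \int_{(i-1)/m}^{i/m} t\,dt = \frac{1}{2m^2} \sum_{i=1}^{m} (2i-1)\big(\tau_{i:m} - \tau_i\big) = \frac{1}{m^2} \sum_{i=1}^{m} i\,\big(\tau_{i:m} - \tau_i\big),$$

where the last equation holds because $\sum_{i=1}^{m} \tau_{i:m} = \sum_{i=1}^{m} \tau_i$. 
This concludes the proof of equation (7). ■ 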


Proof of statement (8). For any two integrable functions $u, v : [0,1] \to \mathbb{R}$, the integral 
$\int_0^1 |I_u(t) - I_v(t)|\,dt$ does not exceed $\int_0^1 |u(t) - v(t)|\,dt$ (e.g., Denneberg, 1994). Setting 
$u = D_m$ and $v = h$, and using the triangle inequality, we obtain the bound 

$$|\mathcal{L}(D_m) - \mathcal{L}(h)| \le 2 \int_0^1 |D_m(t) - h(t)|\,dt.$$

Since the function $h$ is finite and integrable, the right-hand side of the above bound 
converges to zero when $m \to \infty$. This finishes the proof of statement (8). ■ 